Border security is an essential aspect of the national welfare of any country, especially in modern times. In recent elections in the United States, our borders with Canada and Mexico have been a “hot topic” during discussions on immigration policy and national defense. Understanding our borders and the different types of crossings (everyday commuting, economic activity, and tourism) is crucial for helping our leaders make wise policy decisions and for being better global citizens.
During our exploration of US border crossing data, we investigated the differences between the Mexican and Canadian borders, the temporal aspects of border crossings, how different ports compare to each other in their types of crossings, and the effects of major events such as September 11th and COVID-19. Our complete explorations of these data are found on GitHub at https://github.com/rlscott/borderxing.
Our data are from the Bureau of Transportation Statistics, which is part of the US Department of Transportation. The border crossing data were collected at ports of entry by US Customs and Border Protection (CBP). The data include a monthly count of entries into the United States, broken down into categories of vehicles, containers, passengers, and pedestrians. The data cover April 1996 through September 2023.
The data were found on this Bureau of Transportation Statistics website: https://data.bts.gov/stories/s/jswi-2e7b.
The data include 386,549 entries, where each entry represents a monthly crossing count of a particular category at a port of entry. The columns in our data set include the port name, state, port code, border (US-Mexico or US-Canada), date, measure, value, latitude, and longitude. There are 12 different types of measures or categories in our data: bus passengers, buses, pedestrians, personal vehicle passengers, personal vehicles, rail containers empty, rail containers loaded, train passengers, trains, truck containers empty, truck containers loaded, and trucks. The value column contains the monthly count of that particular measure.
We made a few modifications to our data to make them tidier. Fortunately, our data were already in long format, which made analysis much easier. First, we used the lubridate package to create month and year columns in our data (instead of only a month-year date column). We then further cleaned our data with tools in the dplyr package. We set the port name, state, port code, border, month, and measure columns to be factors. We created an indicator column, called type, with two levels, object and people, to better distinguish between different types of traffic. We found this necessary because merely summing all crossings without taking into account the different types of measures would result in some double-counting and a sort of weighted aggregate. For example, a personal vehicle with three passengers would be counted as a total of four crossings, whereas three pedestrians would count as only three crossings in that port’s total. Differentiating between people and objects solves this double-counting problem in our analysis.
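The double-counting issue can be illustrated with a small sketch. The analysis itself was done in R with dplyr; the Python below is only an illustration, and the assignment of measures to the two type levels (the `PEOPLE_MEASURES` set) is our assumption about how the indicator was defined, not code from the project.

```python
# Assumed mapping: passenger and pedestrian measures count people,
# every other measure counts objects (vehicles, trains, containers).
PEOPLE_MEASURES = {
    "bus passengers",
    "pedestrians",
    "personal vehicle passengers",
    "train passengers",
}


def crossing_type(measure):
    """Return the assumed 'type' level for a measure name."""
    return "people" if measure in PEOPLE_MEASURES else "object"


def totals_by_type(rows):
    """Sum (measure, value) pairs separately for people and objects."""
    totals = {"people": 0, "object": 0}
    for measure, value in rows:
        totals[crossing_type(measure)] += value
    return totals


# One personal vehicle carrying three passengers, plus three pedestrians.
rows = [
    ("personal vehicles", 1),
    ("personal vehicle passengers", 3),
    ("pedestrians", 3),
]

naive_total = sum(value for _, value in rows)  # mixes people and objects
split_totals = totals_by_type(rows)            # keeps the two types apart
```

Here the naive sum is 7, while the split totals report 6 people and 1 object, which is the distinction the type column is meant to preserve.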
We also cleaned our data by distinguishing between two ports on the US-Canada border with the same name, Eastport. We noticed that while there were 118 unique port codes, there were only 117 unique port names. Further investigation revealed an Eastport in Maine and another in Idaho. In our data, we renamed these two ports to be ‘Eastport ME’ and ‘Eastport ID’.
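A check of this kind can be sketched as follows. This is an illustrative Python version rather than the dplyr code used in the project; the port codes `A1`, `B2`, and `C3` and the two-letter state values are hypothetical placeholders, not values from the data.

```python
from collections import defaultdict


def disambiguate_port_names(ports):
    """Map each port code to a unique name.

    `ports` is a list of dicts with 'port_code', 'port_name' and 'state'.
    A name shared by more than one port code gets the state appended.
    """
    codes_by_name = defaultdict(set)
    for port in ports:
        codes_by_name[port["port_name"]].add(port["port_code"])

    unique_names = {}
    for port in ports:
        name = port["port_name"]
        if len(codes_by_name[name]) > 1:
            name = f"{name} {port['state']}"
        unique_names[port["port_code"]] = name
    return unique_names


# Hypothetical codes; only the port names and states follow the text.
ports = [
    {"port_code": "A1", "port_name": "Eastport", "state": "ME"},
    {"port_code": "B2", "port_name": "Eastport", "state": "ID"},
    {"port_code": "C3", "port_name": "Sweetgrass", "state": "MT"},
]
names = disambiguate_port_names(ports)
```

After the renaming, the number of distinct names equals the number of distinct port codes.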
All of these changes while cleaning our border data resulted in the following structure:
With the dplyr package, we found 118 unique ports of entry in our data. Of these, 90 are on the US-Canada border and 28 are on the US-Mexico border.
Figure 2.1 shows the location of each border crossing station mapped by latitude and longitude. This interactive map was created using the leaflet package. Anchorage is not included in this map since its coordinates are missing from the data (more about this port will be discussed later).
Figure 2.1: Map of the 118 US Border Ports of Entry
As shown in Figure 2.2, North Dakota and Washington have the most ports of entry on the US-Canada border, and Texas has the most ports of entry on the US-Mexico border. There are 10 states on the US-Canada border and 4 states on the US-Mexico border.
Figure 2.2: Count of Ports of Entry by State, Colored by Border
We also investigated the multinomial distributions of measure counts for each port. For each border (US-Canada and US-Mexico), we found the median value of each of the twelve measures across that border’s ports. We then computed the proportions of these median measure counts, thus creating a median multinomial distribution with 12 outcomes for each border. Next, we found the multinomial distribution for each of the 118 ports of entry. To find ports that significantly deviate in measure proportions from the median port for their respective border, we used a \(\chi^2\) goodness-of-fit test with Monte Carlo simulation. Proportions were used instead of the actual counts because some ports have significantly more traffic than others. Since the \(\chi^2\) test statistic is a summation involving the observed and expected values, ports with more traffic would naturally have a higher test statistic than low-traffic ports that differ significantly from the median distribution. Including Monte Carlo simulation helped improve the accuracy of our test, since proportions are extremely small values. Due to the approximations, we decided to ignore p-values and instead rank the ports by test statistic, resulting in Figure 2.3.
Figure 2.3: Multinomial distributions of each port, ranked by test statistic
Figure 2.3 showcases the multinomial distributions for each port, arranged with the port with the largest test statistic at the top. Labels were removed for some of the ports to prevent over-plotting. The Canada (1) pane contains the ports with the highest test statistics on the US-Canada border. Interestingly, pedestrians, personal vehicles, and personal vehicle passengers have the highest overall counts for most of the ports. For the US-Mexico border, Cross Border Xpress and Boquillas have the greatest deviation from the median distribution, with a large majority of crossings being pedestrians. Upon further investigation, Cross Border Xpress is a pedestrian bridge between San Diego and the Tijuana airport. The distribution of El Paso is closest to the US-Mexico border median distribution. For the US-Canada border, the ports of Anchorage and Skagway have the greatest deviation from the US-Canada border median distribution, with Highgate Springs having the least deviation. Also, ports on the US-Mexico border have more pedestrians than ports on the US-Canada border, which can plausibly be explained by climate differences.
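The ranking step can be sketched as below. This is a simplified Python illustration, not the project’s R code: it uses three toy categories instead of twelve, made-up counts, and a pseudo sample size `n_draws` for the Monte Carlo step that is purely our assumption (the report does not state the size used). Categories with an expected proportion of zero are skipped, which is a simplification.

```python
import random


def proportions(counts):
    """Convert raw counts into proportions that sum to one."""
    total = sum(counts)
    return [c / total for c in counts]


def chi_square_stat(observed, expected):
    """Pearson statistic sum((O - E)^2 / E) computed on proportions.

    Categories with E == 0 are skipped (a simplification).
    """
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def monte_carlo_p_value(observed, expected, n_draws=100, n_sim=2000, seed=1):
    """Estimate a p-value by simulating proportions from `expected`.

    `n_draws` is an assumed pseudo sample size for each simulated port.
    """
    rng = random.Random(seed)
    stat = chi_square_stat(observed, expected)
    categories = list(range(len(expected)))
    at_least_as_extreme = 0
    for _ in range(n_sim):
        draws = rng.choices(categories, weights=expected, k=n_draws)
        simulated = [draws.count(i) / n_draws for i in categories]
        if chi_square_stat(simulated, expected) >= stat:
            at_least_as_extreme += 1
    return stat, (at_least_as_extreme + 1) / (n_sim + 1)


# Toy data: median counts for one border and three hypothetical ports.
median_counts = [50, 30, 20]
ports = {
    "Port A": [500, 300, 200],  # same mix as the median, ten times the traffic
    "Port B": [10, 10, 80],     # very different mix
    "Port C": [40, 40, 20],     # slightly different mix
}
expected = proportions(median_counts)
stats = {name: chi_square_stat(proportions(c), expected) for name, c in ports.items()}
ranked = sorted(stats, key=stats.get, reverse=True)
```

Because the statistic is computed on proportions, Port A scores zero despite its high traffic, and the ranking reflects only how far each port’s mix of measures departs from the median mix.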
We identified outliers in the scatter plot of bus crossings in Figure 2.4. Notably, in November 1999, the port of Sweetgrass recorded a markedly high count of 1375 buses. Conversely, in June 2021, the port of Limestone reported a comparatively low count of 173 buses.
Figure 2.4: Number of Object Crossings by State
We conducted an analysis of the total object transportation for the year 1999 in Montana and the year 2021 in Maine. Upon pinpointing specific months within each year in Figure 2.5, it becomes evident that these data points do not align with the peak of the plot. This observation suggests that bus transportation may serve as an alternative choice, particularly when other modes of transportation prove impractical for individuals crossing the border.
Figure 2.5: Plot of Transportation via Object by Month
In order to validate our hypothesis, we conducted an examination of Bus Passengers transportation across states, revealing six outliers in Figure 2.6. These outliers are as follows:
These instances, deviating significantly from the norm, warrant further investigation and may provide insights into the dynamics of bus transportation across state borders.
Figure 2.6: Number of People Crossings by State
Upon isolating the specific months within the annual plot of total transportation via people in Figure 2.7, we observed that the identified outliers in both Buses and Bus Passengers transportation did not coincide with the peak of the plot. This outcome substantiates our initial assumption that bus transportation serves as an alternative choice, particularly when conventional transportation methods prove impractical for individuals crossing the border. Further analysis indicates that these outliers are not attributed to significant events, reinforcing the notion that the observed deviations in Buses/Bus Passengers transportation are likely driven by individual travel choices rather than external factors.
Figure 2.7: Plot of Transportation via People by Month
Subsequently, we aggregated the data into seven-year intervals, examining the average transportation volumes for both objects and people in Figure 2.8. Our analysis revealed a pronounced seasonal trend, with peak transportation occurring during the summer and a corresponding trough during the winter months.
Additionally, we observed that the average traffic for both objects and people was highest between 1996 and 2002. Over the subsequent 14 years, a discernible decline was noted, followed by a resurgence in traffic from 2017 to 2023.
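The seven-year grouping can be sketched as follows. This is an illustrative Python version, assuming intervals that start in 1996 (which reproduces the 1996 to 2002 and 2017 to 2023 ranges mentioned above); the function names and sample values are hypothetical.

```python
from collections import defaultdict


def seven_year_interval(year, start=1996, width=7):
    """Label the interval containing `year`, e.g. 1999 -> '1996-2002'."""
    lower = start + ((year - start) // width) * width
    return f"{lower}-{lower + width - 1}"


def average_by_interval_and_month(records):
    """Average monthly values within each (interval, month) group.

    `records` is an iterable of (year, month, value) tuples.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for year, month, value in records:
        key = (seven_year_interval(year), month)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up values, only to show the grouping.
records = [(1996, 7, 100), (2002, 7, 200), (2003, 1, 50)]
averages = average_by_interval_and_month(records)
```

With these inputs, the two July values fall in the same 1996-2002 group and are averaged together, while the January 2003 value starts the next interval.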
Figure 2.8: Plot of Average Crossings in Each Seven-Year Interval
Figure 2.9: Sum of people crossing the Mexican and Canadian borders over time. The red line indicates 9/11, gold indicates when Donald Trump was elected, and purple indicates the start of COVID-19.
Figure 2.9 displays the aggregate pattern from January 1996 to September 2023 for both the Mexican and Canadian borders. An important note to remember is that this is the total number of individuals who crossed the border, as opposed to objects. The vertical lines mark potentially significant historical events that may explain certain patterns in the graph.
We can easily see that COVID-19 had a large impact on both Canadian and Mexican border crossings. This is unsurprising since many institutions shut down and border crossing was severely limited. The presidency of Donald Trump, in contrast, seemed to have very little if any effect on either border, despite his heavy involvement with the Mexican border. 9/11, however, seemed to have a more varied effect. It was followed by an obvious lasting reduction in Mexican border crossings, yet it had nearly no effect on Canadian crossings. This could be for any number of reasons, xenophobia among them; however, discerning the cause is not possible with the current data.
The story of 9/11 gets even more interesting on the Mexican border, though. Two of the largest ports, Calexico in California and El Paso in Texas, show a particularly pronounced change in total crossings at the 9/11 timestamp, as seen in Figure 2.10.
Figure 2.10: Sum of people crossing the Mexican border ports Calexico and El Paso over time. The red line indicates 9/11
Figure 2.11: Sum of people crossing the US and Canadian border over time. The red line indicates 9/11
Since these ports are so large, it is reasonable to believe they exert a lot of control over the overall pattern in Figure 2.9. Indeed, we see a significantly reduced effect of 9/11 when these two ports are taken out, as shown by the blue line in Figure 2.11. In fact, the lasting reduction in border crossings has been almost entirely eliminated, and all that remains is a short dip. This dip seems to be the result of a few different ports and is not as easily explained away. One large example of a port with a short dip is San Ysidro in California. The fact that just a few ports explain the majority of the difference between pre-9/11 and post-9/11 border crossings implies that an investigation into Calexico and El Paso could be the most effective way of determining why 9/11 had such a large effect on the Mexican border while having a small effect on the Canadian one.
It is also worth noting that there are other, smaller ports that exhibit the same phenomenon as El Paso and Calexico. However, they are so small that adding them to the list of removed ports barely changes the pattern at all. One example of this is the port of Roma in Texas. This means that Calexico and El Paso should not be treated as a comprehensive explanation of the 9/11 effect, but as the dominant locations of the effect.
Figure 2.9 shows the number of people crossing the Canada and Mexico borders over time. We used the ‘additive’ method to decompose the crossing data for each border into three components: trend, seasonal, and random. Our focus for this project lies in analyzing the seasonal component.
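For readers unfamiliar with additive decomposition, the idea can be sketched as follows. This is a minimal Python illustration of a classical additive decomposition (centred moving-average trend, per-month seasonal means, remainder), assuming a monthly series with an even period of 12; it is not the project’s R code, and the synthetic series is made up.

```python
def decompose_additive(series, period=12):
    """Split a series into trend + seasonal + random parts.

    Assumes an even `period` and at least two full cycles of data.
    Trend and random values are None where the moving average is undefined.
    """
    n = len(series)
    if period % 2 != 0 or n < 2 * period:
        raise ValueError("need an even period and at least two full cycles")
    half = period // 2

    # Centred moving average: the two end points get half weight.
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period

    # Average the detrended values for each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    raw = [sum(b) / len(b) for b in buckets]
    centre = sum(raw) / period
    cycle = [r - centre for r in raw]  # seasonal effects sum to zero

    seasonal = [cycle[t % period] for t in range(n)]
    random_part = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, random_part


# Synthetic example: linear trend plus a repeating monthly pattern.
pattern = [-3, -2, -1, 0, 1, 5, 3, 1, 0, -1, -1, -2]
series = [100 + 2 * t + pattern[t % 12] for t in range(48)]
trend, seasonal, random_part = decompose_additive(series)
peak_position = max(range(12), key=lambda m: seasonal[m])
```

The seasonal component repeats every 12 months, so comparing the first 12 seasonal values of two series is enough to compare their seasonal peaks.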
Figure 2.12: Comparison between the seasonal components of two borders.
In Figure 2.12, it’s evident that the seasonal patterns of Mexico and Canada differ significantly. The observed differences in the seasonal component between Mexico and Canada border crossings may be caused by various factors, including cultural, economic, and climatic influences. The distinct peaks in border crossing values during different months suggest varying trends in travel behavior and trade activities between the two countries. Mexico’s peak occurs in November, whereas Canada’s peak is in May. This discrepancy could potentially be attributed to weather effects; May might be a comfortable month for people to travel or conduct business through the Canada border, whereas November remains warm along the Mexico border.
Figure 2.13: Impact of the 9/11 event on the Mexico and Canada borders. The dark blue line represents the value of September in the year before the event, while the olive green line represents the value of September in the year after the event.
As observed in Figure 2.13, the 9/11 event had a noticeable impact on the value of both borders. There was a significant decline in value in September 2001, followed by a recovery the following year.
Figure 2.14: The impact of the COVID-19 event on the Mexico and Canada borders. The dark blue line represents the value of March in the year prior to the event, while the olive green line represents the value of March in the subsequent year.
From Figure 2.14, it’s evident that COVID-19 had a considerable impact on the values of both borders, which declined sharply in March 2020. This event not only affected the immediate values but also influenced the seasonal patterns of each border. The values remained consistently low in the periods following the initial decline.
We selected the Mexico and Canada border data, filtering by the ‘Type’ attribute equal to ‘People’. From this dataset, we identified the 10 largest values and extracted their corresponding ‘Port_name’. The findings revealed that for the Mexico border, the predominant port is El Paso, while for the Canada border, the prominent ports are Buffalo Niagara Falls and Detroit. As these three ports are of particular interest, we conducted further research on them.
Figure 2.15: Comparison object
In Figure 2.15, we compare the three largest ports identified earlier, and notable differences emerge between Mexico and Canada. It appears that individuals crossing the Mexico border tend to prefer personal vehicles, whereas those crossing the Canada border favor public transportation. Additionally, we observed that the number of empty rail and truck containers is higher at the Mexico border than at the Canada border, indicating a lower utilization rate of traffic infrastructure at the Mexico border. Addressing this discrepancy is crucial when considering environmental impact and energy efficiency.
According to the Bureau of Transportation Statistics FAQ on border crossing data, there are no passenger trains between Mexico and the United States. For freight trains, crews are changed at the US-Mexico border. This means there should be no non-zero entries for train passengers on the US-Mexico border in our data. However, there are 1481 non-zero entries for train passengers, ranging from 1996 to 2023, with most of them being in California. This inconsistency needs to be investigated further.
Most ports of entry have over 100 entries in our data set, but three ports have fewer than 100: Algonac, Anchorage, and Cross Border Xpress. For Cross Border Xpress, this makes sense since it opened only recently, in 2015. Interestingly, Anchorage has only one entry and is quite distant from the US-Canada border in Alaska. It lists only empty shipping containers in September 2023.
Another concern with our data is the zero entries. We wonder whether they are actually zeros or NA values. Sometimes ports are closed (especially seasonally during winter for the US-Canada border), but we know that certain high-traffic ports with lots of commuters probably still had crossings during months where zeros are recorded. For example, during the year 1996, San Ysidro (a major commuting port on the US-Mexico border) has a personal vehicle passenger count of zero for every month. For the purposes of our analysis, we treated the zero entries as zeros and not as NA values.
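A check for sparsely recorded ports can be sketched as below. This is an illustrative Python version rather than the project’s R code; the entry counts for Algonac and El Paso are made-up toy numbers, and only Anchorage’s single entry comes from the text.

```python
from collections import Counter


def sparse_ports(port_names, threshold=100):
    """Return {port: entry count} for ports with fewer than `threshold` rows.

    `port_names` holds one port name per row of the data set.
    """
    counts = Counter(port_names)
    return {port: n for port, n in counts.items() if n < threshold}


# Toy rows: one name per monthly entry.
port_names = ["Anchorage"] + ["Algonac"] * 40 + ["El Paso"] * 250
flagged = sparse_ports(port_names)
```

Ports flagged this way are worth inspecting individually, since a low entry count can reflect a recent opening or a data quality problem.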
Overall, through our analysis we gleaned that the US-Canada border appears to be very cyclic, whereas the US-Mexico border appears to be very volatile and possibly subject to international events and some US foreign policy. The time series analysis showcases that major events had related but not identical effects on the two borders.
One possible avenue for future work is distinguishing between commercial and private traffic. This is complicated by some categories being confounded in the data set, such as passenger trains and freight trains both being counted as ‘trains’. However, we may be able to investigate the flow of imports and exports by comparing empty truck and rail containers (exports) with loaded truck and rail containers.